Characterizing Web Document Change
نویسندگان
چکیده
The World Wide Web is growing and changing at an astonishing rate. For the information in the web to be useful, web information systems such as search engines have to keep up with the growth and change of the web. In this paper we study how web documents change. In particular, we study two important characteristics of web document change that are directly related to keeping web information systems upto-date: the degree of the change and the clusteredness of the change. We analyze the evolution of web documents with respect to these two measures and discuss the implications for web information systems update.
منابع مشابه
An Ensemble Click Model for Web Document Ranking
Annually, web search engine providers spend more and more money on documents ranking in search engines result pages (SERP). Click models provide advantageous information for ranking documents in SERPs through modeling interactions among users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...
متن کاملRRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features
Principal aim of a search engine is to provide the sorted results according to user’s requirements. To achieve this aim, it employs ranking methods to rank the web documents based on their significance and relevance to user query. The novelty of this paper is to provide user feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...
متن کاملDynamics of information access on the web.
While current studies on complex networks focus on systems that change relatively slowly in time, the structure of the most visited regions of the web is altered at the time scale from hours to days. Here we investigate the dynamics of visitation of a major news portal, representing the prototype for such a rapidly evolving network. The nodes of the network can be classified into stable nodes, ...
متن کاملKeeping Web Indices up-to-date
Search engines play a crucial role in the Web. Without search engines large parts of the Web becomes inaccessible for the majority of users. Search engines can make new and smaller sites accessible at low cost. Without them, other media, such as Television, would be needed to advertise the existence new site on the Web, only large commercial sites can follow this path. The Web would be endanger...
متن کاملHierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics
This paper discusses about the future of the World Wide Web development, called Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001